
    Document Clustering with K-tree

    This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines. Comment: 12 pages, INEX 200

    TopSig: Topology Preserving Document Signatures

    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201

    K-tree: Large Scale Document Clustering

    We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory. Comment: 2 pages, SIGIR 200

    Random Indexing K-tree

    Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm. Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere

    Molecular Line Profile Fitting with Analytic Radiative Transfer Models

    We present a study of analytic models of starless cores whose line profiles have "infall asymmetry," or blue-skewed shapes indicative of contracting motions. We compare the ability of two types of analytical radiative transfer models to reproduce the line profiles and infall speeds of centrally condensed starless cores whose infall speeds are spatially constant and range between 0 and 0.2 km s⁻¹. The model line profiles of HCO+ (J=1-0) and HCO+ (J=3-2) are produced by a self-consistent Monte Carlo radiative transfer code. The analytic models assume that the excitation temperature in the front of the cloud is either constant ("two-layer" model) or increases inward as a linear function of optical depth ("hill" model). Each analytic model is matched to the line profile by rapid least-squares fitting. The blue-asymmetric line profiles with two peaks, or with a blue shifted peak and a red shifted shoulder, can be well fit by the "HILL5" model (a five parameter version of the hill model), with an RMS error of 0.02 km s⁻¹. A peak signal to noise ratio of at least 30 in the molecular line observations is required for performing these analytic radiative transfer fits to the line profiles. Comment: 48 pages, 20 figures, accepted for publication in Ap

    The Spitzer c2d Survey Of Nearby Dense Cores. XI. Infrared And Submillimeter Observations Of CB130

    We present new observations of the CB130 region composed of three separate cores. Using the Spitzer Space Telescope, we detected a Class 0 and a Class II object in one of these, CB130-1. The observed photometric data from Spitzer and ground-based telescopes are used to establish the physical parameters of the Class 0 object. Spectral energy distribution fitting with a radiative transfer model shows that the luminosity of the Class 0 object is 0.14-0.16 L⊙, which is low for a protostellar object. In order to constrain the chemical characteristics of the core having the low-luminosity object, we compare our molecular line observations to models of lines including abundance variations. We tested both ad hoc step function abundance models and a series of self-consistent chemical evolution models. In the chemical evolution models, we consider a continuous accretion model and an episodic accretion model to explore how variable luminosity affects the chemistry. The step function abundance models can match observed lines reasonably well. The best-fitting chemical evolution model requires episodic accretion and the formation of CO2 ice from CO ice during the low-luminosity periods. This process removes C from the gas phase, providing a much improved fit to the observed gas-phase molecular lines and the CO2 ice absorption feature. Based on the chemical model result, the low luminosity of CB130-1 is explained better as a quiescent stage between episodic accretion bursts rather than being at the first hydrostatic core stage. NASA 1224608, 1288664, 1407, NNX07AJ72G, 1279198, 1288806, 1342425; NSF AST-0607793, AST-0708158; Korea government (MEST) 2009-0062866; Ministry of Education, Science and Technology 2010-0008704; Astronom

    The Spitzer c2d Survey of Nearby Dense Cores: VI. The Protostars of Lynds Dark Nebula 1221

    Observations of Lynds Dark Nebula 1221 from the Spitzer Space Telescope are presented. These data show three candidate protostars towards L1221, only two of which were previously known. The infrared observations also show signatures of outflowing material, an interpretation which is also supported by radio observations with the Very Large Array. In addition, molecular line maps from the Five College Radio Astronomy Observatory are shown. One-dimensional dust continuum modelling of two of these protostars, IRS1 and IRS3, is described. These models show two distinctly different protostars forming in very similar environments. IRS1 shows a higher luminosity and larger inner radius of the envelope than IRS3. The disparity could be caused by a difference in age or mass, orientation of outflow cavities, or the impact of a binary in the IRS1 core. Comment: accepted for publication in Ap

    Gene expression profiling of epithelium-associated FcRL4(+) B cells in primary Sjogren's syndrome reveals a pathogenic signature

    In primary Sjögren's syndrome (pSS), FcRL4+ B cells are present in inflamed salivary gland tissue, within or in close proximity to ductal epithelium. FcRL4 is also expressed by nearly all pSS-related mucosa-associated lymphoid tissue (MALT) B cell lymphomas, linking FcRL4 expression to lymphomagenesis. Whether glandular FcRL4+ B cells are pathogenic, how these cells originate, and how they functionally differ from FcRL4- B cells in pSS is unclear. This study aimed to investigate the phenotype and function of FcRL4+ B cells in the periphery and parotid gland tissue of patients with pSS. First, circulating FcRL4+ B cells from 44 pSS and 54 non-SS-sicca patients were analyzed by flow cytometry. Additionally, RNA sequencing of FcRL4+ B cells sorted from parotid gland cell suspensions of 6 pSS patients was performed. B cells were sorted from cell suspensions as mini bulk (5 cells/well) based on the following definitions: CD19+CD27-FcRL4- ('naive'), CD19+CD27+FcRL4- ('memory'), and CD19+FcRL4+ B cells. We found that, although FcRL4+ B cells were not enriched in blood in pSS compared with non-SS sicca patients, these cells generally exhibited a pro-inflammatory phenotype. Genes coding for CD11c (ITGAX), T-bet (TBX21), TACI (TNFRSF13B), Src tyrosine kinases and NF-κB pathway-related genes were, among others, significantly upregulated in glandular FcRL4+ B cells versus FcRL4- B cells. Pathway analysis showed upregulation of B cell activation, cell cycle and metabolic pathways. Thus, FcRL4+ B cells in pSS exhibit many characteristics of chronically activated, pro-inflammatory B cells and their gene expression profile suggests increased risk of lymphomagenesis. We postulate that these cells contribute significantly to the epithelial damage seen in the glandular tissue and that FcRL4+ B cells are an important treatment target in pSS

    Advanced model compounds for understanding acid-catalyzed lignin depolymerization : identification of renewable aromatics and a lignin-derived solvent

    This work was funded by the EP/J018139/1, EP/K00445X/1 grants (NJW and PCJK), an EPSRC Doctoral Prize Fellowship (CSL), and the European Union (Marie Curie ITN ‘SuBiCat’ PITN-GA-2013-607044, CWL, NJW, PCJK, PJD, KB, JdeV). The development of fundamentally new approaches for lignin depolymerization is challenged by the complexity of this aromatic biopolymer. While overly simplified model compounds often lack relevance to the chemistry of lignin, the direct use of lignin streams poses significant analytical challenges to methodology development. Ideally, new methods should be tested on model compounds that are complex enough to mirror the structural diversity in lignin but still of sufficiently low molecular weight to enable facile analysis. In this contribution, we present a new class of advanced (β-O-4)-(β-5) dilinkage models that are highly realistic representations of a lignin fragment. Together with selected β-O-4, β-5, and β–β structures, these compounds provide a detailed understanding of the reactivity of various types of lignin linkages in acid catalysis in conjunction with stabilization of reactive intermediates using ethylene glycol. The use of these new models has allowed for identification of novel reaction pathways and intermediates and led to the characterization of new dimeric products in subsequent lignin depolymerization studies. The excellent correlation between model and lignin experiments highlights the relevance of this new class of model compounds for broader use in catalysis studies. Only by understanding the reactivity of the linkages in lignin at this level of detail can fully optimized lignin depolymerization strategies be developed. Postprint. Peer reviewe

    Toxicity Weighting for Human Biomonitoring Mixture Risk Assessment: A Proof of Concept

    Chemical mixture risk assessment has, in the past, primarily focused on exposures quantified in the external environment. Assessing health risks using human biomonitoring (HBM) data provides information on the internal concentration, from which a dose can be derived, of chemicals to which human populations are exposed. This study describes a proof of concept for conducting mixture risk assessment with HBM data, using the population-representative German Environmental Survey (GerES) V as a case study. We first attempted to identify groups of correlated biomarkers (also known as 'communities', reflecting co-occurrence patterns of chemicals) using a network analysis approach (n = 515 individuals) on 51 chemical substances in urine. The underlying question is whether the combined body burden of multiple chemicals is of potential health concern. If so, subsequent questions are which chemicals and which co-occurrence patterns are driving the potential health risks. To address this, a biomonitoring hazard index was developed by summing over hazard quotients, where each biomarker concentration was weighted (divided) by the associated HBM health-based guidance value (HBM-HBGV, HBM value or equivalent). Altogether, for 17 out of the 51 substances, health-based guidance values were available. If the hazard index was higher than 1, then the community was considered of potential health concern and should be evaluated further. Overall, seven communities were identified in the GerES V data. Of the five mixture communities where a hazard index was calculated, the highest hazard community contained N-Acetyl-S-(2-carbamoyl-ethyl)cysteine (AAMA), but this was the only biomarker for which a guidance value was available. Of the other four communities, one included the phthalate metabolites mono-isobutyl phthalate (MiBP) and mono-n-butyl phthalate (MnBP) with high hazard quotients, which led to hazard indices that exceed the value of one in 5.8% of the participants included in the GerES V study. This biological index method can put forward communities of co-occurrence patterns of chemicals on a population level that need further assessment in toxicology or health effects studies. Future mixture risk assessment using HBM data will benefit from additional HBM health-based guidance values based on population studies. Additionally, accounting for different biomonitoring matrices would provide a wider range of exposures. Future hazard index analyses could also take a common mode of action approach, rather than the more agnostic and non-specific approach we have taken in this proof of concept.
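The hazard index arithmetic described in this abstract (each biomarker concentration divided by its guidance value, then summed, with a sum above 1 flagging potential concern) can be sketched as follows. This is only an illustrative sketch: the function names, the concentrations, and the guidance values are hypothetical placeholders, not GerES V data or the authors' code, and it assumes each concentration and its guidance value share the same units.

```python
def hazard_quotients(concentrations, guidance_values):
    """Divide each biomarker concentration by its guidance value.

    Biomarkers without a guidance value are skipped, mirroring the
    abstract's note that values existed for only some substances.
    """
    return {
        name: conc / guidance_values[name]
        for name, conc in concentrations.items()
        if name in guidance_values
    }


def hazard_index(concentrations, guidance_values):
    """Sum the hazard quotients of one individual's biomarker community."""
    return sum(hazard_quotients(concentrations, guidance_values).values())


# Hypothetical numbers for one participant (same units per biomarker).
community = {"MiBP": 60.0, "MnBP": 50.0, "no_guidance_marker": 3.0}
hbgv = {"MiBP": 100.0, "MnBP": 100.0}

hi = hazard_index(community, hbgv)
of_potential_concern = hi > 1  # threshold stated in the abstract
```

With these placeholder numbers the two quotients are 0.6 and 0.5, so the index exceeds 1 even though neither biomarker does so alone, which is the situation a mixture assessment is meant to catch.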